Solution: In No Particular Order

Let's test some techniques for random selection in this lesson to solve the antipattern.

We'll cover the following

Choose a random key-value between 1 and MAX
Choose the next highest key value
Get a list of all key values, choose one at random
Choose a random row using an offset
Proprietary solutions

The sort-by-random technique is an example of a query that’s bound to perform a table scan and an expensive manual sort. When we design solutions in SQL, we should be on the lookout for inefficient queries like this. Instead of searching fruitlessly for a way to optimize an unoptimizable query, we need to rethink our approach. We can use the alternative techniques shown in the following sections to query a random row from a query result set. In different circumstances, each of these solutions can produce the same result with greater efficiency than sort-by-random.

Choose a random key-value between 1 and MAX#

One technique that avoids sorting the table is to choose a random value between 1 and the greatest primary key value.

Enter to Rename, Shift+Enter to Preview

Retrieving a random key value using 1 and MAX technique

This solution assumes that:

the primary key values start at 1
the primary key values are contiguous, that is, there are no values unused between 1 and the greatest value. If there are gaps, a randomly chosen value may not match a row in the table.

We use this solution when we know that our key uses all values between 1 and the greatest key value.

Choose the next highest key value#

This is similar to the preceding solution, but if we have gaps of unused values between 1 and the greatest key value, this query matches a random value to the first key value it finds.

Enter to Rename, Shift+Enter to Preview

Retrieving the next random key value using 1 and MAX technique

This solves the problem of a random number that misses any key value, but it also means that a key value that follows a gap is chosen more often. Random values should be approximately even in distribution, but bug_id values aren’t.

We use this solution when gaps are uncommon and when it’s not important for all key values to be chosen with equal frequency.

Get a list of all key values, choose one at random#

We can also use the application code to pick one value from the primary keys in the result set. Then, we can query the full row from the database using that primary key. This technique is shown in the following PHP code:

<?php 
＄bug_id_list = ＄pdo->query("SELECT bug_id FROM Bugs")->fetchAll();
＄rand = random( count(＄bug_id_list) );
＄rand_bug_id = ＄bug_id_list[＄rand]["bug_id"];
＄stmt = ＄pdo->prepare("SELECT * FROM Bugs WHERE bug_id = ?"); ＄stmt->execute( array(＄rand_bug_id) );
＄rand_bug = ＄stmt->fetch();
?>

This avoids sorting the table, and the chance of choosing each key value is approximately equal, but this solution has other costs:

Fetching all the bug_id values from the database might return a list of impractical sizes. It can even exceed application memory resources and cause an error such as the following:

Fatal error: The allowed memory size of 16777216 bytes exhausted.
The query must be run twice: once to produce the list of primary keys and a second time to fetch the random row. If the query is too complex and costly, this is a problem.

We can use this solution when we’re selecting a random row from a simple query with a moderately sized result set. This solution is good for choosing from a list of noncontiguous values.

Choose a random row using an offset#

Still another technique that avoids the problems found in the preceding alternatives is to count the rows in the data set and return a random number between 0 and the count. Then, we can use this number as an offset when querying the data set.

<?php 
＄rand = "SELECT ROUND(RAND() * (SELECT COUNT(*) FROM Bugs))";
＄offset = ＄pdo->query(＄rand)->fetch(PDO::FETCH_ASSOC);
＄sql = "SELECT * FROM Bugs LIMIT 1 OFFSET :offset";
＄stmt = ＄pdo->prepare(＄sql);
＄stmt->execute( ＄offset );
＄rand_bug = ＄stmt->fetch();
?>

This solution relies on the non-standard LIMIT clause, supported by MySQL, PostgreSQL, and SQLite.

An alternative that uses the ROW_NUMBER() window function works in Oracle, Microsoft SQL Server, and IBM DB2.

For example, here’s the solution in Oracle:

<?php 
＄rand = "SELECT 1 + MOD(ABS(dbms_random.random()), (SELECT COUNT(*) FROM Bugs)) AS offset FROM dual";
＄offset = ＄pdo->query(＄rand)->fetch(PDO::FETCH_ASSOC);
＄sql = "WITH NumberedBugs AS ( SELECT b.*, ROW_NUMBER() OVER (ORDER BY bug_id) AS RN FROM Bugs b ) SELECT * FROM NumberedBugs WHERE RN = :offset";
＄stmt = ＄pdo->prepare(＄sql);
＄stmt->execute( ＄offset );
＄rand_bug = ＄stmt->fetch();
?>

We can use this solution when we can’t assume contiguous key values, and we need to make sure each row has an even chance of being selected.

Proprietary solutions#

Any given brand of database might implement its own solution for this kind of task. For example, Microsoft SQL Server 2005 added a TABLESAMPLE clause:

Enter to Rename, Shift+Enter to Preview

Random selection using TABLESAMPLE in Microsoft SQL Server 2005

Oracle uses a slightly different SAMPLE clause, for example, to return 1 percent of the rows in the table:

Enter to Rename, Shift+Enter to Preview

Random selection using SAMPLE in Oracle

We should read the documentation for the appropriate solution in our brand of database. There are often limitations or other options that we need to know about.

Antipattern: Sort Data Randomly

Synopsis: Poor Man’s Search Engine

Mark as Completed

Report an Issue

Introduction

Logical Antipattern - Jaywalking

Logical Antipattern - Naive Trees

Logical Antipattern - ID Required

Logical Antipattern - Keyless Entry

Assessment: Logical Antipatterns - Part 1

Logical Antipattern - Entity-Attribute-Value

Logical Antipattern - Polymorphic Associations

Logical Antipattern - Multicolumn Attributes

Logical Antipattern - Metadata Tribbles

Assessment: Logical Antipatterns - Part 2

Physical Antipattern - Rounding Errors

Physical Antipattern - 31 Flavors

Physical Antipattern - Phantom Files

Physical Antipattern - Index Shotgun

Assessment: Physical Antipatterns

Query Antipattern - Fear of the Unknown

Query Antipattern - Ambiguous Groups

Query Antipattern - Random Selection

Query Antipattern - Poor Man’s Search Engine

Query Antipattern - Spaghetti Query

Query Antipattern - Implicit Columns

Assessment: Query Antipattern

Application Development Antipattern - Readable Passwords

Application Development Antipattern - SQL Injection

Application Development Antipattern - Pseudokey Neat-Freak

Application Development Antipattern - See No Evil

Application Development Antipattern - Diplomatic Immunity

Application Development Antipattern - Magic Beans

Assessment: Application Development Antipattern

Conclusion

Solution: In No Particular Order

Choose a random key-value between 1 and MAX#

Choose the next highest key value#

Get a list of all key values, choose one at random#

Choose a random row using an offset#

Proprietary solutions#